Clustering Large Data Sets Described With Discrete Distributions and An Application on TIMSS Data Set
Authors
University of Ljubljana, Faculty of Economics, Department of Statistics, [email protected] (corresponding author)
University of Ljubljana, Faculty of Mathematics and Physics, Department of Mathematics, [email protected]
The Educational Research Institute, Slovenia, [email protected]
Abstract
Symbolic Data Analysis is based on special descriptions of data called symbolic objects. Such descriptions preserve more detailed information about the data than the usual representations with mean values. A representation with distributions is one special kind of symbolic object. In the clustering process, this representation lets us consider variables of all types at the same time. We present two clustering methods based on data descriptions with discrete distributions: the adapted leaders method and the adapted Ward's agglomerative hierarchical clustering method. The two methods are compatible: they can be viewed as two approaches to solving the same clustering optimization problem. In the resulting clustering, each cluster is assigned a leader. The descriptions of the leaders offer a simple interpretation of the clusters' characteristics. The leaders method efficiently solves clustering problems with a large number of units, while the agglomerative method is applied to the obtained leaders and lets us decide on the right number of clusters on the basis of the corresponding dendrogram.
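The leaders step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the dissimilarity or the leader update rule, so the code below assumes squared Euclidean distance between probability vectors and takes each leader to be the componentwise mean of its cluster's distributions. The function names (`sq_dist`, `mean_distribution`, `leaders_method`) and the toy data are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two discrete distributions (assumed dissimilarity)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_distribution(rows):
    """Componentwise mean of distributions; the result is again a distribution."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def leaders_method(units, k, iters=20, seed=0):
    """Alternate between assigning units to the nearest leader and recomputing leaders."""
    rng = random.Random(seed)
    # Start from k distinct units chosen at random as initial leaders.
    leaders = [list(u) for u in rng.sample(units, k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(u, leaders[j])) for u in units
        ]
        if new_assign == assign:
            break  # assignments are stable, so the leaders will not change either
        assign = new_assign
        for j in range(k):
            members = [u for u, a in zip(units, assign) if a == j]
            if members:  # keep the old leader if a cluster becomes empty
                leaders[j] = mean_distribution(members)
    return assign, leaders


# Toy data: each unit is a distribution over three categories of one variable.
units = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]
assign, leaders = leaders_method(units, k=2)
```

Under these assumptions, each returned leader is itself a discrete distribution, which is what makes the clusters easy to interpret. A hierarchical step such as Ward's method could then be run on the leaders alone, rather than on all units.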
Similar resources
Clustering large data sets described with discrete distributions and its application on TIMSS data set
Symbolic Data Analysis is based on special descriptions of data called symbolic objects. Such descriptions preserve more detailed information about the data than the standard representations with mean values. A representation with distributions is one special kind of symbolic object. In the clustering process, this representation lets us consider variables of all types at the same time. We p...
A new approach for data visualization problem
Data visualization is the process of transforming data, information, and knowledge into visual form, making use of humans' natural visual capabilities. It reveals relationships in data sets that are not evident from the raw data by using mathematical techniques to reduce the number of dimensions in the data set while preserving the relevant inherent properties. In this paper, we formulated d...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Bank efficiency evaluation using a neural network-DEA method
At present, evaluating the performance of banks is an important subject for societies and for bank managers who want to expand the scope of their operations. One of the non-parametric approaches for evaluating efficiency is data envelopment analysis (DEA). Using a mathematical programming model, DEA provides an estimation of efficiency surfaces. A major problem faced by DEA is that...
Application of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data
Introduction: Clustering of the human brain is a very useful tool for diagnosis, treatment, and tracking of brain tumors. Several methods exist for this purpose. In this study, modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...
Publication date: 2010